Frontiers in Genetics — Latest Matching Preprints

1

A Foundational Exome Resource for Jordan: Dual Ancestry Admixture and Population-Specific Variants to Improve Clinical Variant Interpretation

Froukh, T.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.23.26353895 medRxiv

Top 0.1%

26.0%

Currently, the genetic architecture of Middle Eastern populations is underrepresented in global genomic databases. This gap increases the rate of Variants of Uncertain Significance (VUSs) and of clinical misinterpretation of genomic data in these populations. Whole exome sequencing was conducted on 90 healthy individuals from Jordan, and the data were analysed using Principal Component Analysis (PCA) and multi-step computational filtering. PCA revealed a dual-ancestry (EUR-AFR) admixture rather than a triple admixture (EUR-AFR-AMR). More than 3,500 population-specific variants (PSVs) were identified, of which 72% were singletons. Additionally, 19 variants were significantly enriched compared with the maximum allele frequencies in public global databases (Fisher's exact test with Benjamini-Hochberg false discovery rate correction, p < 0.05). Consequently, the results support reclassifying the VUSs in the ECE2 gene as likely benign, and the variants with conflicting classifications of pathogenicity in IL1RN and THPO as benign, based on their significantly elevated allele frequency (AF = 0.0389, p < 0.05). Furthermore, a ClinVar pathogenic variant was identified in a healthy individual, warranting careful interpretation. The findings underscore the importance of identifying PSVs to minimize or even prevent clinical misdiagnosis, and highlight the unique genetic signature of the Jordanian population. The study serves as a foundational resource for precision medicine in the region.
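The enrichment test named above can be sketched as a one-sided Fisher's exact test on a 2x2 table of allele counts (cohort vs. database, alternate vs. reference), followed by Benjamini-Hochberg adjustment across variants. This is a minimal standard-library illustration, not the study's pipeline: the one-sided alternative, the function names and the database allele counts in the example are assumptions chosen for illustration.

```python
from math import comb


def fisher_enrichment_p(alt_cohort, ref_cohort, alt_db, ref_db):
    """One-sided Fisher's exact test: is the alt allele more frequent in the cohort?

    2x2 table of allele counts (rows: cohort, database; columns: alt, ref).
    Returns P(X >= alt_cohort) under the hypergeometric null with fixed margins.
    """
    n_cohort = alt_cohort + ref_cohort          # alleles observed in the cohort
    n_alt = alt_cohort + alt_db                 # alt alleles across both rows
    n_total = n_cohort + alt_db + ref_db        # all alleles in the table
    denom = comb(n_total, n_cohort)
    upper = min(n_cohort, n_alt)
    # comb() returns 0 for impossible cells, so no explicit lower bound is needed
    tail = sum(comb(n_alt, k) * comb(n_total - n_alt, n_cohort - k)
               for k in range(alt_cohort, upper + 1))
    return tail / denom


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 7 alt alleles among 180 (90 individuals, AF ~ 0.0389)
# versus a made-up database count of 20 alt alleles among 20,000.
p_raw = [fisher_enrichment_p(7, 173, 20, 19980), 0.20, 0.04]
q = benjamini_hochberg(p_raw)
```

In practice the database row would come from the allele counts behind the maximum population allele frequency; a two-sided test is an equally plausible choice if depletion is also of interest.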

2

Deep analysis of FANTOM CAGE data reveals hierarchical patterns of TSS co-deployment hubs and their disruption in cancers

Meduri, R.; Satish, A. L.; Singh, U.

2026-05-18 genomics 10.64898/2026.05.15.725323 medRxiv

Top 0.2%

10.6%

Selective deployment of multiple transcription start sites (TSSs) is a major regulatory feature of human transcriptomes. FANTOM CAGE data exhibit a near-universal parsimony in TSS deployment that is disrupted in cancers. We have recently shown that TSS deployment is sensitive to gene function, futile upstream transcription, and cellular biosynthetic states. Patterns in FANTOM CAGE data can reveal the mechanisms underlying TSS co-deployment. We propose and test the possibility that some TSSs behave like epromoters and act as co-varying hubs of transcriptional activity for multiple other promoters. Using deep analysis of CAGE data implemented through neural networks, we show that non-cancer samples implement transcriptional co-deployment through cores of epromoter-like TSSs that are generally proximal to their start codons. These TSSs show enhancer-like transcription factor binding site (TFBS) profiles. A comparison with cancer CAGE data shows that the concentrated epromoter core is disrupted in cancers, with multiple distal TSSs replacing the proximal TSS cores. We provide evidence that the core TSSs are rich in YY1 and CTCF binding sites and are associated with genes coding for transcription factors. Our findings show that the covariance of TSS deployment is sensitive to transcriptional resource cost, and that a parsimonious design of TSS co-deployment depends on proximal TSSs in non-cancers, a mechanism grossly disrupted in cancers.
Highlights:
- Heterogeneous FANTOM CAGE data contain universal patterns of TSS co-deployment.
- TSS co-deployments exhibit a parsimonious "core-covariant" scheme that is disrupted in cancers.
- Core TSSs are enriched in transcription factor binding sites and gene functions consistent with the biological features of the samples.
- The deep-learning (DL) pipeline we present identifies the core-covariant TSS sets in an unbiased manner.

3

Evaluation of the Contribution of Natural Selection to Greater Cardiometabolic Disease Risk in South Asian Populations

Searby, D. J. C.; Hemani, G.; Chong, A.; Lawson, D. J.; Chaturvedi, N. J.; Davey Smith, G.

2026-05-22 genetic and genomic medicine 10.64898/2026.05.15.26353234 medRxiv

Top 0.3%

10.1%

A greater genetic susceptibility has been proposed to explain the higher rates of cardiovascular and metabolic disease in South Asian relative to European populations. We first demonstrate that, after accounting for technical artefacts, the genetic effects for related traits are largely consistent between ancestral groups, which downplays the role of GxG or GxE interactions in driving differential prevalence. If higher genetic susceptibility in South Asians is due to selective pressures acting through adiposity-related traits in the evolutionary past, signatures of selection should be evident at loci associated with cardiometabolic disease and other causally related traits (e.g. fat distribution). We tested for enrichment of several selection statistics (FST, XP-EHH and XP-nSL) at loci associated with a range of traits related to cardiometabolic disease, in comparison with a null distribution of SNPs matched on linkage disequilibrium (LD) score and minor allele frequency (MAF). Loci associated with a subset of these traits (type 2 diabetes mellitus, trunk fat percentage, body fat percentage and trunk fat mass) exhibited enrichment for FST, consistent with a moderate adaptive explanation for their cross-population differentiation. In contrast, none of the studied traits were enriched for haplotype-based statistics, indicating that cross-population genetic divergence is unlikely to have been driven by recent selective sweeps and has more likely arisen from either ancient selection or recent polygenic selection acting on standing variation.
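The comparison against an LD-score- and MAF-matched null can be sketched as a resampling test: draw one bin-matched non-trait SNP per trait SNP, recompute the mean statistic, and count how often the null mean reaches the observed one. This is a minimal sketch assuming SNPs are already assigned to joint LD/MAF bins and that the mean FST is the test statistic; the function name, bin labels and number of draws are hypothetical, and the study's exact resampling scheme may differ.

```python
import random
from statistics import mean


def matched_null_enrichment(trait_snps, snp_bin, snp_stat, n_draws=1000, seed=1):
    """Empirical p-value for the mean statistic at trait loci vs a matched null.

    trait_snps: SNP ids associated with the trait
    snp_bin:    SNP id -> matching bin, e.g. (LD-score decile, MAF decile)
    snp_stat:   SNP id -> selection statistic, e.g. FST
    Assumes every bin used by a trait SNP also contains non-trait SNPs.
    """
    rng = random.Random(seed)
    trait_set = set(trait_snps)
    pools = {}
    for snp, b in snp_bin.items():
        if snp not in trait_set:
            pools.setdefault(b, []).append(snp)
    observed = mean(snp_stat[s] for s in trait_snps)
    exceed = 0
    for _ in range(n_draws):
        # One bin-matched null SNP drawn per trait SNP
        draw = [rng.choice(pools[snp_bin[s]]) for s in trait_snps]
        if mean(snp_stat[s] for s in draw) >= observed:
            exceed += 1
    # Add-one correction keeps the empirical p-value above zero
    return observed, (exceed + 1) / (n_draws + 1)
```

The same loop works for haplotype statistics such as XP-EHH by swapping the values in `snp_stat`.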

4

Genome-wide identification of rhabdoviral sequences in alfalfa (Medicago sativa L.)

Grinstead, S.; Nemchinov, L. G.

2026-05-22 genomics 10.64898/2026.05.20.726541 medRxiv

Top 0.3%

10.1%

We recently reported the identification of endogenous viral elements (EVEs) originating from the Caulimoviridae family within the alfalfa (Medicago sativa L.) genome. Our subsequent identification of ubiquitous rhabdoviral elements in infected and healthy alfalfa tissues by high-throughput sequencing prompted us to suggest that the alfalfa genome might be populated with integrated rhabdoviruses as well. Bioinformatic analysis of 26 publicly available alfalfa genomes confirmed this suggestion. We found multiple non-retroviral segments of the family Rhabdoviridae, belonging to the genera Betanucleorhabdovirus and Betacytorhabdovirus, that appear to be stable constituents of the host genome. As such, they could potentially acquire functional roles in alfalfa's development and response to environmental stresses. We believe this study reveals the first documented case of rhabdoviruses integrated into the alfalfa genome.

5

Psychometric Validation of the Education and Assessment of Genetic Literacy (EAGL) Measure

Barna, L. S.; Liao, Y.; Wierbicki, M.; Ramirez-Renta, G. M.; Kaphingst, K.; Gunter, C.

2026-05-26 genetics 10.64898/2026.05.22.727229 medRxiv

Top 0.4%

8.6%

Genetic literacy is an integral measure for examining society's interaction with genetics, but widely used "genetic literacy" measures lack both knowledge comprehension components and psychometric validation. To address these issues, we validated the Education and Assessment of Genetic Literacy measure (EAGL) in a sample of 2708 US participants, using both exploratory and confirmatory factor analysis. In addition to standard subjective and objective knowledge subscales, our measure's distinct knowledge comprehension subscale focuses on autism as an example of a complex condition. Regression analyses showed a statistically significant interaction between education and personal connection to autism in relation to knowledge comprehension (F=3.68, p=0.003). Separately, participants with a connection to autism scored higher only on the subjective knowledge section (F=19.52, p<0.001), consistent with previous demonstrations of a subjective-objective knowledge gap in science literacy. We explored geographic location as one potential factor in genetic literacy and found that metropolitan versus non-metropolitan status had no significant main effect on overall levels. After the validation process, we have two multi-domain measures that accurately capture the construct of genetic literacy and are available for wide use: the multi-faceted EAGL-long, which has previously been tested in thousands of participants, and the validated three-factor EAGL-short.

6

Transcriptomic profiling of embryo-derived cell lines from the Chagas disease insect vector Rhodnius prolixus

de Andrade Tavares, L.; Garcia, A. C.; Bell-Sakyi, L.; Fontenele de Brito, T.; Pane, A.

2026-05-12 genetics 10.64898/2026.05.08.723764 medRxiv

Top 0.5%

8.4%

Rhodnius prolixus is a primary insect vector of Trypanosoma cruzi, the causative agent of Chagas disease, a neglected parasitosis endemic in Latin American countries. It has been estimated that Chagas disease affects 7-8 million people worldwide and is responsible for approximately 1000 deaths per year. Genetic and molecular studies in this species remain challenging due to its life cycle and feeding habits, hindering the development of new strategies to control its populations and reduce the spread of Chagas disease. Recently, two stable cell lines, RPE/LULS53 and RPE/LULS57, were derived from Rhodnius embryos; they represent promising new tools to investigate the genetics of this insect vector. Here, we describe their gene expression landscapes using transcriptomic approaches. We show that 8,968 expressed genes are shared between the two cell lines, whereas 391 and 1,088 genes are uniquely expressed in RPE/LULS53 and RPE/LULS57, respectively. Although key components of primary developmental, immune and redox signaling pathways are expressed in both cell lines, some genes, such as Frizzled-10-a-like and catalase, show marked differences in expression. Our results strongly suggest that RPE/LULS53 and RPE/LULS57 represent two different cell phenotypes. Consistent with this, gene ontology analysis reveals that RPE/LULS53 is enriched for animal organ morphogenesis and stress response, while RPE/LULS57 is enriched for DNA-directed RNA polymerase activity, among other terms. Despite these differences, both cell lines express comparable levels of transcripts from resident transposable elements, including the highly abundant Mariner and LINE/I elements, as well as horizontally transferred transposons. Our findings shed light on the nature of the RPE/LULS53 and RPE/LULS57 embryo-derived cell lines and provide valuable transcriptomic resources for future genetic and functional studies in Rhodnius and other triatomine insect vectors.
Author summary
Rhodnius prolixus is a blood-feeding insect and a major vector of Chagas disease, a parasitosis endemic in Latin America and affecting millions of people worldwide. In the absence of effective drugs and vaccines, controlling the insect population is a promising strategy to reduce the spread of the disease. Yet, genetic and functional studies in Rhodnius are extremely challenging due to its feeding habits and life cycle. To overcome these limitations, researchers have previously developed two stable cell lines derived from Rhodnius embryos. In this study, we provide the first characterization of the genes expressed in these cell lines. We found that, while the two cell lines share many expressed genes, each also has distinct gene expression patterns pointing to two different cell types with specialized functions. These differences likely affect the way they respond to stress and regulate biological processes. Our findings provide an important resource for researchers studying Rhodnius prolixus and other insect vectors, helping to advance our understanding of the genetic and molecular mechanisms that control insect development and mediate the interactions between insect vectors and the parasites they transmit.
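Counts of shared and uniquely expressed genes, such as those reported for the two cell lines, reduce to set operations once an expression cutoff is chosen. The sketch below assumes per-gene TPM values and a hypothetical cutoff of 1 TPM; the abstract does not state which quantification or threshold the authors used, and the gene names are placeholders.

```python
def expression_overlap(tpm_a, tpm_b, threshold=1.0):
    """Split genes into shared and line-specific expressed sets.

    tpm_a, tpm_b: dicts mapping gene id -> expression (e.g. TPM) in each cell line.
    A gene counts as 'expressed' at or above `threshold` (hypothetical cutoff).
    """
    expressed_a = {g for g, v in tpm_a.items() if v >= threshold}
    expressed_b = {g for g, v in tpm_b.items() if v >= threshold}
    shared = expressed_a & expressed_b
    only_a = expressed_a - expressed_b
    only_b = expressed_b - expressed_a
    return shared, only_a, only_b


# Toy values only; gene names are placeholders, not Rhodnius annotations.
line53 = {"geneA": 12.0, "geneB": 0.2, "geneC": 5.5}
line57 = {"geneA": 8.1, "geneB": 3.4, "geneD": 2.0}
shared, only53, only57 = expression_overlap(line53, line57)
```

The resulting counts depend strongly on the cutoff, so it is worth reporting alongside any shared/unique tallies.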

7

Conditional and marginal SNP-heritability to leverage ancestral and environmental diversity

Singh Sachan, A. N.; Schwartzman, A.; Azriel, D.

2026-05-29 genetics 10.64898/2026.05.28.728536 medRxiv

Top 0.5%

8.2%

SNP-heritability is defined as the fraction of the variance of a trait that is explained by the SNPs in a genome-wide association study. Several methods have been proposed to estimate this quantity. More recent methods aim to do so with ancestrally diverse datasets, yet obtain a single heritability for the entire dataset, which we refer to as marginal heritability. However, the different subpopulations that make up a genetically diverse dataset may have different environmental and genetic exposures, and thus different heritabilities. To address this, we propose a conditional SNP-heritability approach that estimates multiple SNP-heritabilities within a single dataset, corresponding to different ancestral compositions and environmental exposures. We take a careful statistical approach, including estimation of conditional genetic and environmental variances, and calculation of standard errors via a combination of the delta method with bootstrapping. We validate our method via extensive simulations. We then apply it to an ancestrally and socio-economically diverse dataset of 6603 subjects aged approximately 9 to 11 years from the Adolescent Brain Cognitive Development study, and illustrate how the SNP-heritability of intelligence scores can change owing to differing extrinsic variances across socio-economic groups, consistent with previous work in the literature. This conditional estimation approach can be a valuable tool for understanding differences in risk across subpopulations. Our work improves on existing methodology and allows us to leverage the heterogeneity of the data to obtain new insights.
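The delta-method part of the standard-error calculation can be illustrated for a single group. With h² = σ²_g / (σ²_g + σ²_e), the gradient is (σ²_e, −σ²_g) / (σ²_g + σ²_e)², and Var(ĥ²) ≈ ∇ᵀ Σ ∇, where Σ is the sampling covariance of the two variance estimates. This sketch assumes Σ is supplied externally, for example from bootstrap replicates, which is one plausible reading of "delta method with bootstrapping" rather than the authors' confirmed procedure; group names and numbers are hypothetical.

```python
from math import sqrt


def h2_with_delta_se(var_g, var_e, cov_matrix):
    """SNP-heritability h2 = var_g / (var_g + var_e) and its delta-method SE.

    cov_matrix: 2x2 sampling covariance of the (var_g, var_e) estimates,
    assumed here to come from bootstrap replicates.
    """
    total = var_g + var_e
    h2 = var_g / total
    # Gradient of h2 with respect to (var_g, var_e)
    grad = (var_e / total**2, -var_g / total**2)
    var_h2 = sum(grad[i] * cov_matrix[i][j] * grad[j]
                 for i in range(2) for j in range(2))
    return h2, sqrt(var_h2)


# Hypothetical groups: equal genetic variance, different environmental variance
cov = [[0.0025, -0.0005], [-0.0005, 0.0030]]
groups = {"group_1": (0.40, 0.60), "group_2": (0.40, 0.40)}
for name, (vg, ve) in groups.items():
    h2, se = h2_with_delta_se(vg, ve, cov)
    print(f"{name}: h2 = {h2:.3f} (SE {se:.3f})")
```

The toy groups show the point made in the abstract: identical genetic variance yields different heritabilities when the environmental (extrinsic) variance differs.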

8

Genome-wide computational prediction of miRNAs encoded by influenza A virus (H3N2) predicts target genes involved in pulmonary and antiviral innate immunity

Siddiqi, M. A.; Kumar, H.; Mazumder, M.

2026-05-18 bioinformatics 10.64898/2026.05.18.725090 medRxiv

Top 0.6%

6.9%

Influenza A virus (IAV) causes significant morbidity and mortality worldwide. Understanding how viral RNAs may regulate host genes through microRNA-like mechanisms can clarify pathogenesis and reveal therapeutic targets. In this study, we screened all eight IAV H3N2 RNA segments (PB2, PB1, PA, HA, NP, NA, M, and NS) using an ab initio computational pipeline; five segments (PB2, PB1, PA, HA, and M) met the VMir scoring threshold for further analysis, while NP, NA, and NS were excluded due to low pre-miRNA scores. Mature miRNAs were identified using MatureBayes, and target genes in the human genome were predicted with the miRDB server. From these targets, we selected two genes per qualifying segment (10 genes total) based on their functional relevance to influenza infection and supporting literature; all selected genes are unique to their respective segment. We identified 10 segment-specific target genes (IFNL1, DDX60, SAMHD1, MAVS, IRF4, BIRC2, AGO1, MAP3K1, NOD1, and TNFAIP1) and one common target across all five analyzed segments (CADM2). Gene Ontology and pathway analyses showed enrichment in interferon signaling, RIG-I-like receptor pathways, antiviral restriction, RNA interference, and inflammatory responses. Literature supports roles for these genes in pulmonary and antiviral innate immunity. Our findings provide a basis for experimental validation and may help the research community better understand influenza virus pathogenesis and identify novel therapeutic candidates.
[Graphical abstract]

9

Rare genetic variants in the IIS/mTOR signalling pathway identified in exceptionally long-lived individuals show shared in vitro effects associated with lifespan across species

Neuerburg, M.; Smulders, L.; van den Akker, E. B.; Kolbe, D.; Artoni, F.; Brusius, I.; Hinterding, H.; Beltrame, L.; Pahl, R.; Suchiman, H. E. D.; Papadakis, A.; Beyer, A.; Beekman, M.; Nebel, A.; Slagboom, P. E.; Baghdadi, M.; Deelen, J.

2026-05-28 genetics 10.64898/2026.05.28.728260 medRxiv

Top 0.6%

6.9%

Background
The increase in human lifespan without a proportional increase in healthspan imposes a substantial burden on individuals and society. Exceptionally long-lived individuals and members of long-lived families exhibit a compression of multimorbidity. Genetics, and in particular rare protein-altering variants, appear to play an important role in their longevity.
Methods
In this study, we employed a targeted pathway approach to provide functional evidence for the significance of rare variants, identified in long-lived individuals, in the insulin/insulin-like growth factor 1 signalling and mechanistic target of rapamycin (IIS/mTOR) signalling pathway. To this end, we used CRISPR/Cas9 to introduce these rare genetic variants into mouse embryonic stem cells (mESCs). We subsequently assessed several functional readouts that have previously been associated with lifespan regulation in model organisms and/or with IIS/mTOR and mitogen-activated protein kinase/extracellular signal-regulated kinase (MAPK/ERK) signalling pathway activity.
Results
Functional characterisation revealed that the variants exhibit both shared and distinct effects on the signalling pathways. Principal component analysis of omics-based datasets showed that the variants clustered into two groups, a distribution that corresponds to the grouping observed for a subset of functional readouts. All variant mESC lines exhibited downregulation of IIS/mTOR and MAPK/ERK signalling pathway activity, as well as increased Foxo3 expression and FOXO3 binding activity. We identified alterations in lipid and mitochondrial metabolism, including a reduction in mitochondrial DNA levels, which were mostly shared among all variants. All variant mESC lines exhibited a signature implying increased pluripotency. The effects on stress resistance and growth rate diverged between the two variant groups, with partially opposing effects: Group 1 demonstrated a reduced growth rate and increased resistance to a subset of stressors, while Group 2 demonstrated an increased growth rate and reduced resistance to a subset of stressors.
Conclusions
Here, we provide evidence that rare genetic variants in the IIS/mTOR and MAPK/ERK signalling pathways identified in long-lived human individuals result in shared functional effects associated with longevity in model organisms. These insights can serve as a foundation to better understand the role of rare variants in the insulin signalling network in the regulation of human longevity.
[Graphical abstract]

10

Gene model for the ortholog of Lst8 in Drosophila yakuba

Lawson, M. E.; Sanow, K. A.; Chetana, K.; Taylor, E.; Morgan, A.; Flannery, D.; Elsie, C.; Rele, C. P.; Reed, L. K.; O'Rourke, K. S.

2026-05-14 genomics 10.64898/2026.05.12.723325 medRxiv

Top 0.6%

6.8%

Gene model for the ortholog of Lst8 (Lst8) in the May 2011 (WUGSC dyak_caf1/DyakCAF1) Genome Assembly (GenBank Accession: GCA_000005975.1) of Drosophila yakuba. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.

11

Inferring the demographic history of Chinese and Indian rhesus macaque (Macaca mulatta) populations from PacBio HiFi long-read sequencing data

Heenkenda, E. J.; Versoza, C. J.; Terbot, J. W.; Soni, V.; Spatola, G. J.; Pfeifer, S. P.; Jensen, J. D.

2026-05-26 evolutionary biology 10.64898/2026.05.25.727731 medRxiv

Top 0.6%

6.8%

The rhesus macaque (Macaca mulatta) is one of the most widely used animal models in biomedical research, both because it resembles humans in key biological respects and because it has a broad geographic range. Most of the individuals housed in U.S. research colonies have been sampled from either China or India, though notably the source population of these animals has shifted significantly over time. Given the substantial genetic and immunological differences between these populations, a deeper understanding of the underlying population structure is critically important for biomedical interpretation. Despite this, the demographic histories of these two populations remain poorly resolved. Here, we present an analysis of whole-genome PacBio HiFi long-read sequencing data from ten unrelated individuals from each population, applying four related model-based and non-model-based demographic inference approaches to reconstruct their ancestral history. We evaluated the fit of the resulting models against the empirical data and incorporated the underlying uncertainty in the mutation rates used for scaling. We inferred a well-fitting population history characterized by substantial structure between the Chinese and Indian populations, with a split time of ~140,000 generations ago from an ancestral population of ~65,000 individuals. We additionally inferred the subsequent history of size change within, and gene flow between, these populations, reaching current estimated sizes of ~220,000 individuals in the Chinese population and ~14,000 individuals in the Indian population. The robust baseline demographic model established in this study will serve as a valuable resource for future research on this species, including improved fine-scale recombination mapping, selection inference, and association studies.
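Demographic inference tools typically estimate parameters in population-scaled units, which are converted into numbers of individuals and generations using a mutation rate; repeating the conversion across a range of plausible rates is one simple way to carry mutation-rate uncertainty into the final estimates. The sketch assumes the common convention θ = 4·N_anc·μ (per site) and split times in units of 2·N_anc generations; the input values are hypothetical and are not the study's estimates.

```python
def scale_by_mutation_rate(theta_per_site, t_split_scaled, mu_per_site_per_gen):
    """Convert scaled demographic estimates into absolute units.

    Assumes theta = 4 * N_anc * mu (per site) and split times expressed in
    units of 2 * N_anc generations.
    """
    n_anc = theta_per_site / (4 * mu_per_site_per_gen)
    t_generations = t_split_scaled * 2 * n_anc
    return n_anc, t_generations


# Hypothetical scaled estimates, propagated across a range of mutation rates
for mu in (0.5e-8, 0.75e-8, 1.0e-8):
    n_anc, t_gen = scale_by_mutation_rate(theta_per_site=0.002,
                                          t_split_scaled=1.0,
                                          mu_per_site_per_gen=mu)
    print(f"mu={mu:.2e}: N_anc ~ {n_anc:,.0f}, split ~ {t_gen:,.0f} generations")
```

Because both quantities scale as 1/μ, doubling the assumed mutation rate halves the inferred population sizes and split time.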

12

A Putative Single-Locus Determinant of the Suppressed In Ovo Virus Infection (SOV) Trait in Apis mellifera

Lefebre, R.; Broeckx, B. J. G.; De Smet, L.; Braeckman, M.; Gregorc, A.; Peelman, L.; de Graaf, D. C.

2026-05-29 genomics 10.64898/2026.05.28.728461 medRxiv

Top 0.7%

6.4%

Today, the deformed wing virus (DWV) can be considered one of the major causes of elevated colony losses of the western honey bee (Apis mellifera) worldwide. Virus transmission may occur horizontally between individuals of the same generation, but also vertically from parents to offspring. The recently defined heritable suppressed in ovo virus infection (SOV) trait describes the absence of viruses in pooled drone eggs of a queen, and is associated with significantly lower DWV prevalence and viral loads in the subsequent developmental stages of the offspring. By definition, the trait reflects the absence of vertical virus transmission from SOV-positive (SOV+) queens themselves to their offspring. However, the genetic basis of this heritable virus resilience has not yet been identified. In this study, we aimed to identify SOV-associated genetic markers or loci in the honey bee genome through a genome-wide variant comparison of 44 DWV-positive and 44 DWV-negative drone pupae descended from an artificially created hybrid SOV+/SOV- colony. After whole-genome sequencing (WGS), variant calling, and genotype-phenotype association analysis by means of single-marker tests and elastic net regression, one variant, located in a 241,246 bp locus on chromosome 7 that contained 17 other highly SOV-associated variants, correctly classified 68.2% of the drone phenotypes. These results may support the potential application of marker-assisted selection (MAS) strategies targeting reduced vertical virus transmission in honey bees.
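Because drones are haploid, a single variant acts as a two-state classifier of DWV status, and 68.2% of 88 drones corresponds to 60 correct calls (60/88 ≈ 0.682). The sketch below computes such an in-sample classification accuracy, choosing whichever allele-to-phenotype orientation fits better; the 0/1 encoding and function name are assumptions, and the authors' classification procedure, which also involved elastic net regression, may differ.

```python
def single_variant_accuracy(alleles, phenotypes):
    """Fraction of drones whose phenotype is predicted by one haploid variant.

    alleles:    0 (reference) or 1 (alternate) per drone; drones are haploid
    phenotypes: 1 = DWV-positive, 0 = DWV-negative
    The allele-to-phenotype orientation is chosen to maximise agreement.
    """
    n = len(alleles)
    agree = sum(a == p for a, p in zip(alleles, phenotypes))
    return max(agree, n - agree) / n


# Toy data: the alt allele tracks DWV-positive status in 3 of 4 drones
print(single_variant_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))
```

An accuracy estimated on the same drones used for discovery is optimistic; held-out colonies would give a fairer estimate for marker-assisted selection.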

13

Integrative Genomic Analyses Identify COL21A1 and ENPEP-FGF5 Regulatory Pathways for Blood Pressure Variation in East Asians

LAU, Z. C.; Chang, X.; Sim, K. S.; Wu, H.; Naaz, A.; Muniasamy, U.; Khor, C.-C.; Koh, W.-P.; Vitaly, S.; Dorajoo, R.

2026-05-18 genetics 10.64898/2026.05.14.725285 medRxiv

Top 0.7%

6.4%

Background
Hypertension is a highly heritable cardiovascular disorder and a major determinant of cardiometabolic disease, including diabetes. However, the regulatory genes and tissue-specific mechanisms underlying blood pressure variation remain incompletely understood.
Methods
Leveraging a well-characterized prospective population-based cohort of 27,308 participants from the Singapore Chinese Health Study (SCHS), we evaluated genome-wide genetic associations for five blood pressure traits: hypertension status, systolic blood pressure, diastolic blood pressure, mean arterial pressure (MAP) and pulse pressure (PP). Additionally, we conducted a transcriptome-wide association study (TWAS) integrating gene expression data from 49 tissues, followed by colocalization and fine-mapping to prioritize regulatory genes. The association of identified variants with incident diabetes was additionally evaluated using the longitudinal data.
Results
We validated 10 blood pressure loci (P between 1.64 × 10^-20 and 4.10 × 10^-8) and identified an East Asian-specific splice donor variant in COL21A1 associated with PP (rs149344559, P = 6.78 × 10^-10). Integrative analyses prioritized FGF5 in kidney cortex and ENPEP in pituitary tissue as candidate regulatory genes. The blood pressure-lowering allele at ENPEP (T allele, rs1879056) was associated with a reduced risk of incident diabetes. Mediation analysis demonstrated that approximately 21% of the genetic association with diabetes was mediated through MAP (P_indirect-effect = 2 × 10^-16).
Conclusion
This study refines genetic predispositions for blood pressure among East Asians. We further delineate tissue-specific regulatory pathways underlying blood pressure variation and identify ENPEP-mediated dysfunction linking blood pressure genetics to diabetes risk, underscoring integrated disease mechanisms.
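The "proportion mediated" figure can be illustrated with the product-of-coefficients approach: regress the mediator on the exposure (slope a), regress the outcome on both exposure and mediator (direct slope c′ and mediator slope b), and report a·b / (a·b + c′). This sketch uses ordinary least squares on a continuous outcome for simplicity; the abstract does not specify the authors' mediation model, and with a binary incident-diabetes outcome a different estimator would likely be used. Function names and toy data are hypothetical.

```python
from statistics import mean


def _centered_cross(u, v):
    """Sum of centred cross-products, the building block of OLS slopes."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def proportion_mediated(x, m, y):
    """Product-of-coefficients mediation with ordinary least squares.

    x: exposure (e.g. allele dosage), m: mediator (e.g. MAP), y: outcome.
    a  = slope of m on x
    b  = slope of y on m, adjusting for x
    c' = direct slope of y on x, adjusting for m
    """
    sxx, smm, sxm = (_centered_cross(x, x), _centered_cross(m, m),
                     _centered_cross(x, m))
    sxy, smy = _centered_cross(x, y), _centered_cross(m, y)
    a = sxm / sxx
    det = sxx * smm - sxm ** 2           # determinant of the 2x2 normal equations
    c_direct = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    indirect = a * b
    return indirect / (indirect + c_direct)


# Toy data built so that a = 2, b = 3, c' = 1 exactly (proportion = 6/7)
x = [0, 1, 2, 3, 4]
m = [1, 1, 4, 5, 9]
y = [3, 4, 14, 18, 31]
print(round(proportion_mediated(x, m, y), 4))
```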

14

Promises and limitations of local ancestry inference in imputed ancient genomes

Bougiouri, K.; Irving-Pease, E. K.; Frantz, L. A. F.; Racimo, F.; Petr, M.

2026-05-20 evolutionary biology 10.64898/2026.05.19.725905 medRxiv

Top 0.9%

6.3%

Recent advances in genome imputation have enabled the application of state-of-the-art statistical methods, originally developed for present-day genomes, to ancient genomes. One class of such methods, known as local ancestry inference (LAI), can model an individual's genome as a mosaic of tracts assigned to different putative ancestral sources, revealing patterns of genetic ancestry across the genome. However, most LAI methods have been designed to study recent admixture events in human history, and they generally assume large panels of present-day genomes. Despite the recent availability of high-quality imputed ancient genomes, it remains unknown to what degree LAI is reliable for such datasets. Ancient DNA is often characterized by heterogeneous geographic and temporal sampling, varying degrees of divergence between ancient source proxies and admixing populations, and complex demographic histories. Here, we performed an extensive set of population genetic simulations to evaluate the accuracy of four popular LAI methods (RFMix, FLARE, MOSAIC and simpLAI) under different demographic scenarios, temporal sampling schemes, sample sizes, and admixture dates. We quantify the accuracy of these methods as a function of different parameters in practically relevant scenarios, and provide general guidelines for future studies using LAI in ancient DNA research.
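In simulation studies like this one, a common way to score LAI is the fraction of base pairs whose inferred ancestry matches the simulated truth. The sketch assumes true and inferred tracts for one haplotype are given as sorted, half-open (start, end, label) intervals; it illustrates the metric only and is not the evaluation code used in the study, whose accuracy measures are not detailed in the abstract. The ancestry labels are placeholders.

```python
def tract_accuracy(true_tracts, inferred_tracts):
    """Base-pair-weighted fraction of positions where inferred ancestry is correct.

    Each tract list holds sorted, half-open (start, end, label) intervals for
    one haplotype. Positions not covered by both lists are ignored.
    """
    breakpoints = sorted({pos for start, end, _ in true_tracts + inferred_tracts
                          for pos in (start, end)})

    def label_at(tracts, pos):
        for start, end, label in tracts:
            if start <= pos < end:
                return label
        return None

    matched = total = 0
    # Labels are constant between consecutive breakpoints, so test each segment once
    for left, right in zip(breakpoints, breakpoints[1:]):
        truth, guess = label_at(true_tracts, left), label_at(inferred_tracts, left)
        if truth is None or guess is None:
            continue
        total += right - left
        if truth == guess:
            matched += right - left
    return matched / total if total else float("nan")


truth = [(0, 100, "source_A"), (100, 200, "source_B")]
inferred = [(0, 120, "source_A"), (120, 200, "source_B")]  # breakpoint 20 bp off
print(tract_accuracy(truth, inferred))
```

The linear scan in `label_at` keeps the sketch short; real chromosome-scale evaluations would walk both tract lists with two pointers instead.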

15

Identification of septoria nodorum blotch susceptibility genes in hard winter wheat

Ara, A. M.; Holmes, D. J.; Friesen, T. L.; Carver, B. F.; Bai, G.; St. Amand, P.; Bernado, A.; Sharma, R.; Aoun, M.

2026-05-15 genetics 10.64898/2026.05.13.724689 medRxiv

Top 0.9%

6.3%

Key message
Both characterized and novel septoria nodorum blotch susceptibility/resistance genes were identified in contemporary U.S. hard winter wheat.
The necrotrophic fungus Parastagonospora nodorum is the causal agent of septoria nodorum blotch (SNB) of wheat. To determine the prevalence of SNB sensitivity genes in contemporary U.S. hard winter wheat (HWW), we evaluated a panel of 619 breeding lines and cultivars against five P. nodorum isolates and five necrotrophic effectors (NEs), SnToxA, SnTox1, SnTox3, SnTox267 and SnTox5, and genotyped the panel using genotyping-by-sequencing (GBS) markers and diagnostic Kompetitive allele-specific PCR (KASP) markers for the sensitivity genes Tsn1-B1, Snn1-B1, and Snn3-B1/B2. GBS analysis identified 34,357 GBS single nucleotide polymorphism (SNP) markers. Evaluations against P. nodorum isolates showed that 40-67% of the genotypes in the panel were susceptible. Toxin infiltration assays showed that 54%, 2%, 37%, 13%, and 15% of the genotypes were sensitive to SnToxA, SnTox1, SnTox3, SnTox267, and SnTox5, respectively. Diagnostic KASP markers for Tsn1-B1, Snn1-B1, and Snn3-B1/B2 showed prediction accuracies of 98%, 75%, and 92% for the corresponding effectors SnToxA, SnTox1, and SnTox3, respectively. Genome-wide association studies (GWAS) not only confirmed the presence of the previously characterized sensitivity genes Tsn1-B1, Snn1-B1, Snn2, Snn3-B1/B2, and Snn5-B1, but also identified new loci associated with responses to P. nodorum isolates and NEs. Among these, Qsnb.osu-2AS on chromosome 2AS was associated with responses to all five isolates. We developed KASP markers KASP_S4B_643615365, KASP_S2D_16184991, and KASP_S2A_9833162 linked to Snn5-B1, Snn2, and Qsnb.osu-2AS, respectively. These findings should guide breeding for SNB resistance in hard winter wheat.
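A diagnostic marker's prediction accuracy, as reported for the KASP markers above, can be read as the share of genotypes whose marker call agrees with the infiltration phenotype. This is a minimal sketch assuming calls are coded as carrying the sensitivity allele ("S") or the insensitivity allele ("I"), with missing or ambiguous calls excluded; the coding and function name are hypothetical, and the study's exact definition of prediction accuracy is not given in the abstract.

```python
def marker_prediction_accuracy(marker_calls, sensitive):
    """Agreement between a diagnostic marker and the infiltration phenotype.

    marker_calls: "S" (sensitivity allele), "I" (insensitivity allele), or
                  None for missing/ambiguous calls, which are excluded
    sensitive:    True if the genotype was sensitive to the effector
    """
    pairs = [(call == "S", obs) for call, obs in zip(marker_calls, sensitive)
             if call in ("S", "I")]
    if not pairs:
        raise ValueError("no informative marker calls")
    return sum(pred == obs for pred, obs in pairs) / len(pairs)


# Toy panel of five genotypes; the last has a missing marker call
calls = ["S", "S", "I", "I", None]
phenotype = [True, False, False, False, True]
print(marker_prediction_accuracy(calls, phenotype))
```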

16

The contribution of non-additive genetic effects to the genetic variance of polyploid species.

Clo, J.

2026-05-14 genetics 10.64898/2026.05.12.724556 medRxiv

Top 0.9%

6.3%

Whole genome duplication is a common mutation in eukaryotes with far-reaching phenotypic effects. The resulting morphological, physiological, and fitness consequences and how they affect the survival probability of newly polyploid lineages are intensively studied, but very little is known about the effect of genome doubling on the short-term evolvability of populations. Understanding the effect of polyploidization on the adaptive potential of populations is of crucial importance to predict the future of polyploid populations. In this paper, I investigate the immediate consequences of genome doubling on the genetic variance of populations. To do so, I performed numerical iterations and simulations of how the genetic variance of a quantitative trait changes after polyploidization, under different genetic architectures (additivity, dominance, and epistasis). I found that genetic variance generally decreases after genome doubling. Non-additive gene actions can make autotetraploid populations genetically more diverse than their diploid progenitors in rare cases, notably with overdominance and directional epistasis. By collecting estimates from the agronomic literature, I found that both dominance and epistatic variance contribute to the genetic variance of polyploid populations. These results bring new insights into the adaptive potential of newly formed tetraploid populations, and call for further experimental investigations of how polyploidization is associated with a short-term decrease in evolvability.

17

A novel matrix multiplication framework for modeling genotype-by-environment interaction in genomic prediction

Montesinos-Lopez, O. A.; Montesinos-Lopez, A.; Montesinos-Lopez, J. C.; Crossa, J.; Dreisigacker, S.; Hernandez-Suarez, C. M.; Ortiz, R.

2026-05-15 genetics 10.64898/2026.05.11.724414 medRxiv

Top 0.9%

6.2%

Accurate modeling of genotype-by-environment (GxE) interaction is critical for genomic prediction in plant breeding, but it remains challenging because of complex interaction structures. Conventional models often use the Hadamard (element-wise) product of genotype and environment covariance matrices to capture joint similarity, which may not fully represent GxE complexity. Here we propose a novel framework that derives covariance structures from the matrix multiplication of genotype and environment kernels and decomposes the result into symmetric components that are incorporated as random effects in mixed models. When evaluated on 11 wheat and rice multi-environment datasets, both individually and across datasets, this approach consistently outperformed the traditional Hadamard-based model. It improved prediction accuracy by up to 13.2% in Pearson's correlation and enhanced top-selection accuracy. Combining both methods yielded the highest performance, indicating that they capture complementary information. This framework offers a flexible, interpretable, and computationally feasible extension for modeling GxE interaction, potentially enhancing the effectiveness of genomic selection under diverse environmental conditions.
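To make the contrast between the two kernel constructions concrete, here is a minimal NumPy sketch. It assumes observation-level kernels built with incidence matrices. The abstract does not give the exact decomposition, so taking the symmetric part (K_G K_E + K_E K_G)/2 of the matrix product is an assumption for illustration. The function names and the simulated toy data are also hypothetical.

```python
import numpy as np

def observation_kernels(G, E, geno_idx, env_idx):
    """Expand line-level (G) and environment-level (E) kernels to the observation level."""
    Zg = np.eye(G.shape[0])[geno_idx]  # incidence matrix: observation -> genotype
    Ze = np.eye(E.shape[0])[env_idx]   # incidence matrix: observation -> environment
    return Zg @ G @ Zg.T, Ze @ E @ Ze.T

def hadamard_gxe(KG, KE):
    """Conventional GxE kernel: element-wise (Hadamard) product of the two kernels."""
    return KG * KE

def matmul_gxe_symmetric(KG, KE):
    """Symmetric part of the matrix product KG @ KE (assumed decomposition).

    KG @ KE is generally not symmetric, so it cannot serve directly as a covariance
    matrix. Because KG and KE are symmetric, (KG @ KE).T == KE @ KG, so the
    symmetric part is (KG @ KE + KE @ KG) / 2.
    """
    P = KG @ KE
    return (P + P.T) / 2

# Simulated toy data: 4 lines x 20 SNP markers, 2 environments x 5 covariates.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(4, 20)).astype(float)
X -= X.mean(axis=0)                  # center marker columns
G = X @ X.T / X.shape[1]             # simple linear genomic relationship (X X' / p)
W = rng.normal(size=(2, 5))
E = W @ W.T / W.shape[1]             # environmental similarity from covariates

geno_idx = [0, 1, 2, 3, 0, 1, 2, 3]  # every line observed in both environments
env_idx = [0, 0, 0, 0, 1, 1, 1, 1]
KG, KE = observation_kernels(G, E, geno_idx, env_idx)

H = hadamard_gxe(KG, KE)
S = matmul_gxe_symmetric(KG, KE)
print(H.shape, S.shape)              # (8, 8) (8, 8)
print(np.allclose(S, S.T))           # True
```

The symmetric part of a product of two positive semidefinite matrices is symmetric, but it is not guaranteed to be positive semidefinite. A real mixed-model implementation would therefore need to check or adjust it, for example by inspecting its eigenvalues, before using it as a covariance matrix.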

18

Using combined RNA/DNA short read sequencing to investigate allele-specific expression from the inactive X chromosome in human cells

Thomas, R.; Blower, M.

2026-05-24 bioinformatics 10.64898/2026.05.21.726886 medRxiv

Top 0.9%

6.2%

Many genomic regions exhibit allele-specific expression. This effect is most pronounced in imprinted genes, where one copy of a gene is epigenetically silenced, and on the inactive X chromosome of female cells, where almost the entire chromosome is silenced. Allele-specific gene expression can have significant effects on human health and is implicated in a wide array of diseases. Research into allele-specific expression is most often carried out in mouse models. In these models, crossing mouse strains can yield progeny with well-characterised haplotypes in which the parent of origin is known for a very large number of SNPs. The same approach cannot be taken with human data, where haplotypes must be assembled using expensive and labour-intensive long-read sequencing and Hi-C-based approaches. Resolved haplotypes are available for a number of cell lines, allowing accurate measurement of allele-specific gene expression, but this type of analysis remains inaccessible to non-specialist labs. We demonstrate how to use previously published haplotypes to investigate X-linked gene silencing and epigenetic changes. Additionally, we present a method that exploits the profound difference in expression levels between the two human X chromosomes. This method assigns SNPs in expressed RNA to the active or inactive X chromosome using only short-read DNA and RNA sequencing. We demonstrate this technique using sequencing libraries generated in house and sequencing data from publicly available databases, including data for a cell line with a complex karyotype. In each case, we identified genes that were silenced in that cell line, opening new avenues for further research. This X chromosome haplotyping technique can be applied to any clonally derived human cell line with two or more X chromosomes. Researchers can therefore investigate X-linked gene silencing in cell lines already present in their lab, rather than only in the limited number of cell lines for which a haplotype is available.
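The core idea of the short-read approach can be illustrated with a simple rule. In a clonal line with one active X, a SNP that is heterozygous in DNA but effectively monoallelic in RNA places the expressed allele on the active X (Xa) and the other allele on the inactive X (Xi). The Python sketch below is a minimal version under that assumption. The class name, function name, and all thresholds (DNA allele fraction, RNA depth, monoallelic cutoff) are hypothetical choices, not values from the paper. The example counts are invented.

```python
from dataclasses import dataclass

@dataclass
class SnpCounts:
    """Allele read counts at one X-linked SNP (hypothetical input format)."""
    pos: int
    ref: str
    alt: str
    dna_ref: int  # DNA reads supporting the reference allele
    dna_alt: int  # DNA reads supporting the alternative allele
    rna_ref: int  # RNA reads supporting the reference allele
    rna_alt: int  # RNA reads supporting the alternative allele

def assign_x_alleles(snps, min_dna_frac=0.2, min_rna_depth=10, mono_frac=0.9):
    """Assign alleles of DNA-heterozygous SNPs to the active (Xa) or inactive (Xi) X.

    Returns {pos: (xa_allele, xi_allele)} for monoallelically expressed SNPs and
    {pos: "biallelic"} for SNPs expressed from both alleles (candidate escape genes).
    SNPs that are not heterozygous in DNA or lack RNA coverage are skipped.
    Thresholds are illustrative assumptions.
    """
    calls = {}
    for s in snps:
        dna_total = s.dna_ref + s.dna_alt
        rna_total = s.rna_ref + s.rna_alt
        if dna_total == 0 or rna_total < min_rna_depth:
            continue
        dna_alt_frac = s.dna_alt / dna_total
        if not (min_dna_frac <= dna_alt_frac <= 1 - min_dna_frac):
            continue  # not heterozygous in DNA, so it cannot distinguish the two X's
        rna_alt_frac = s.rna_alt / rna_total
        if rna_alt_frac >= mono_frac:
            calls[s.pos] = (s.alt, s.ref)      # alt expressed -> alt on Xa
        elif rna_alt_frac <= 1 - mono_frac:
            calls[s.pos] = (s.ref, s.alt)      # ref expressed -> ref on Xa
        else:
            calls[s.pos] = "biallelic"         # both alleles expressed
    return calls

snps = [
    SnpCounts(100, "A", "G", 20, 18, 0, 45),   # only G expressed
    SnpCounts(200, "C", "T", 15, 17, 30, 1),   # essentially only C expressed
    SnpCounts(300, "G", "A", 22, 20, 12, 14),  # both alleles expressed
    SnpCounts(400, "T", "C", 40, 0, 25, 0),    # homozygous in DNA, skipped
]
calls = assign_x_alleles(snps)
print(calls)  # {100: ('G', 'A'), 200: ('C', 'T'), 300: 'biallelic'}
```

Allowing DNA allele fractions well away from 0.5 is meant to accommodate lines with more than two X chromosomes. For such lines, the expected heterozygous fraction may be closer to 1/3 or 2/3.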

19

Circular RNA-associated QTLs show stronger association with splicing-QTLs than with expression-QTLs

Zabala, A.; Ascension, A. M.; Iniguez, S. G.; Iparraguirre, L.; Andres-Leon, E.; Matesanz, F.; Otaegui, D.; Munoz-Culla, M.

2026-05-29 genetics 10.64898/2026.05.29.728707 medRxiv

Top 1.0%

6.1%

Introduction: Circular RNA quantitative trait loci (circQTLs) have emerged as a class of regulatory variants, but their mechanistic basis remains poorly characterized. Understanding how genetic variation influences circRNA biogenesis is essential to clarify their role in post-transcriptional gene regulation.

Methods: We systematically compared circQTLs with matched splicing (sQTL) and expression (eQTL) datasets. Using bootstrap-based Jaccard similarity analyses, we quantified genomic overlap patterns and assessed their statistical significance. We further validated these findings across independent circQTL studies. In addition, we analyzed the genomic distribution of circQTLs to identify enrichment patterns across functional genomic regions.

Results: circQTLs exhibited a statistically significant but modestly stronger genomic overlap with sQTLs than with eQTLs. This pattern was consistent across independent datasets despite limited reproducibility of individual circQTL signals. Genomic annotation revealed distinct distributional patterns, including depletion in exonic regions and relative enrichment in non-coding genomic contexts compared with other QTL classes.

Discussion: Together, these results suggest that circRNA-associated regulatory variation is preferentially linked to splicing-related mechanisms rather than to transcriptional control of host genes. However, the modest effect size indicates that this relationship is not exclusive. It likely reflects a mixture of shared splice-site regulatory effects and additional mechanisms specific to back-splicing that are not captured by conventional sQTL or eQTL frameworks. This dual architecture positions circRNA biogenesis at the interface between splicing dynamics, RNA structure, and higher-order genomic organization, supporting circQTLs as a distinct layer of post-transcriptional gene regulation.
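Jaccard similarity between two variant sets A and B is |A ∩ B| / |A ∪ B|. The abstract does not detail the bootstrap, so the Python sketch below is one plausible version under that assumption. It resamples the circQTL set with replacement, recomputes its Jaccard similarity with the sQTL and eQTL sets, and summarizes the difference. Variant IDs, set sizes, and function names are hypothetical.

```python
import random

def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def bootstrap_jaccard_diff(circ, sqtl, eqtl, n_boot=1000, seed=0):
    """Bootstrap the difference J(circ, sQTL) - J(circ, eQTL).

    Returns (approximate 2.5% quantile, approximate 97.5% quantile,
    fraction of replicates with a positive difference).
    """
    rng = random.Random(seed)
    circ_list = sorted(circ)
    diffs = []
    for _ in range(n_boot):
        # Resample circQTLs with replacement; duplicates collapse in the set.
        sample = set(rng.choices(circ_list, k=len(circ_list)))
        diffs.append(jaccard(sample, sqtl) - jaccard(sample, eqtl))
    diffs.sort()
    lo = diffs[int(0.025 * (n_boot - 1))]
    hi = diffs[int(0.975 * (n_boot - 1))]
    frac_pos = sum(d > 0 for d in diffs) / n_boot
    return lo, hi, frac_pos

# Toy variant sets with hypothetical IDs
circ = {f"rs{i}" for i in range(100)}
sqtl = {f"rs{i}" for i in range(0, 100, 2)} | {f"rs{i}" for i in range(200, 250)}
eqtl = {f"rs{i}" for i in range(0, 100, 5)} | {f"rs{i}" for i in range(300, 380)}

print(round(jaccard(circ, sqtl), 3), round(jaccard(circ, eqtl), 3))  # 0.333 0.111
lo, hi, frac_pos = bootstrap_jaccard_diff(circ, sqtl, eqtl)
print(lo, hi, frac_pos)
```

This bootstrap captures sampling variability of the circQTL set only. A test against chance overlap would additionally need a null model, such as randomly placed variants matched for allele frequency and distance to genes.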

20

Alternative polyadenylation and the sex-specific gene expression program in hemp

Shivakumar, A.; Hunt, A. G.; Chakrabarti, M.

2026-05-17 plant biology 10.64898/2026.05.13.725035 medRxiv

Top 1.0%

5.1%

Hemp (Cannabis sativa) produces a wide array of medicinally significant compounds, including cannabidiol (CBD). These compounds are predominantly synthesized in female hemp inflorescences. This study used next-generation sequencing-based transcriptome analysis with a 3′-end-directed approach to identify differentially expressed genes between male and female hemp plants at the early vegetative stage. We identified 886 differentially expressed genes (DEGs), the majority of which were upregulated in males compared with females. We hypothesized that alternative RNA processing contributes to sex-specific gene expression. To test this, we identified 932 genes that exhibited significant changes in poly(A) site usage between males and females. These genes were much more likely to be differentially expressed, supporting the hypothesis. Males tend to have longer 3′ UTRs with canonical motifs in the near-upstream elements (NUE), whereas females have shorter 3′ UTRs with A-rich motifs near the cleavage site. This suggests that polyadenylation remodels hemp mRNAs, with distal poly(A) sites preferred in males. To determine when this sex-specific gene expression program is established, RNA was isolated from developing seeds, four-day-old seedlings, and plants at later developmental stages up to four weeks after sowing. Diagnostic male-specific genes were then analyzed by RT-PCR. The results indicate that sex-specific gene expression is not evident in seeds but is established during or after germination.

Significance:
- Hemp males tend to have longer 3′ UTRs with canonical motifs in the near-upstream elements (NUE), whereas females have shorter 3′ UTRs with A-rich motifs near the cleavage site.
- The sex-specific gene expression program is not yet established in mature seed but is set between germination and 4 days of growth.
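Shifts in poly(A) site usage are often summarized per gene from 3′-end read counts at each site. The Python sketch below uses two simple summaries. The first is the fraction of reads at the most distal site. The second is the read-weighted mean distance of the sites from the stop codon, used as a proxy for 3′ UTR length. The abstract does not name the statistic the authors used, so both metrics are assumptions. The function names and the counts below are invented for illustration.

```python
def distal_usage(pas_counts):
    """Fraction of 3'-end reads assigned to the most distal poly(A) site of a gene.

    pas_counts: list of (distance_from_stop_codon_nt, read_count) pairs.
    Returns None when the gene has no reads.
    """
    total = sum(count for _, count in pas_counts)
    if total == 0:
        return None
    _, distal_count = max(pas_counts, key=lambda pc: pc[0])  # site farthest downstream
    return distal_count / total

def mean_utr_length(pas_counts):
    """Read-weighted mean distance of poly(A) sites from the stop codon (3' UTR length proxy)."""
    total = sum(count for _, count in pas_counts)
    if total == 0:
        return None
    return sum(dist * count for dist, count in pas_counts) / total

# Invented counts for one gene with a proximal (250 nt) and a distal (900 nt) site
male = [(250, 10), (900, 40)]
female = [(250, 35), (900, 15)]
print(distal_usage(male), distal_usage(female))        # 0.8 0.3
print(mean_utr_length(male), mean_utr_length(female))  # 770.0 445.0
```

A positive male-minus-female difference in either metric corresponds to the distal-site preference in males described above. Deciding whether such a shift is significant would need replicate-aware testing across samples.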